Contributor

@ahkcs ahkcs commented Nov 10, 2025

Description

Originally from #4056 by @selsong

This PR implements a significant performance optimization for the reverse command by eliminating the expensive ROW_NUMBER() window function and implementing a three-tier logic based on query context.

Motivation

The previous implementation used the ROW_NUMBER() window function, which:

  • Required materializing the entire dataset
  • Caused excessive memory usage
  • Failed on large datasets (100M+ records) with "insufficient resources" errors

Solution: Three-Tier Reverse Logic

The reverse command now follows context-aware behavior:

  1. With existing sort/collation: Reverses all sort directions (ASC ↔ DESC)
  2. With @timestamp field (no explicit sort): Sorts by @timestamp in descending order
  3. Without sort or @timestamp: The command is ignored (no-op)
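The three tiers above can be sketched as a small decision function. This is a dependency-free illustration, not the code in CalciteRelNodeVisitor.visitReverse: the class name, the Tier enum, and the chooseTier signature are all hypothetical, and the plan is reduced to a list of existing sort keys plus the set of field names in the row type.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical model of the three-tier decision; not the actual visitReverse code. */
final class ReverseTierSketch {
  enum Tier { REVERSE_EXISTING_SORT, SORT_BY_TIMESTAMP_DESC, NO_OP }

  static final String TIMESTAMP_FIELD = "@timestamp";

  /**
   * @param existingSortKeys fields of the sort already in the plan (empty if none)
   * @param rowTypeFields    field names exposed by the current row type
   */
  static Tier chooseTier(List<String> existingSortKeys, Set<String> rowTypeFields) {
    if (!existingSortKeys.isEmpty()) {
      return Tier.REVERSE_EXISTING_SORT;   // tier 1: flip every sort direction
    }
    if (rowTypeFields.contains(TIMESTAMP_FIELD)) {
      return Tier.SORT_BY_TIMESTAMP_DESC;  // tier 2: newest events first
    }
    return Tier.NO_OP;                     // tier 3: reverse is ignored
  }

  public static void main(String[] args) {
    System.out.println(chooseTier(List.of("balance"), Set.of("balance", "firstname")));
    System.out.println(chooseTier(List.of(), Set.of("@timestamp", "category")));
    System.out.println(chooseTier(List.of(), Set.of("account_number", "firstname")));
  }
}
```

Checking the existing sort first means an explicit sort always takes precedence over the @timestamp default, which matches the order in which the tiers are listed.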

Implementation Details

1. Reverse with Explicit Sort (Primary Use Case)

Query:

source=accounts | sort +balance, -firstname | reverse

Behavior: Flips all sort directions: +balance, -firstname → -balance, +firstname

Logical Plan:

LogicalSystemLimit(sort0=[$3], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first], fetch=[10000], type=[QUERY_SIZE_LIMIT])
  LogicalProject(account_number=[$0], firstname=[$1], ...)
    LogicalSort(sort0=[$3], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
      CalciteLogicalIndexScan(table=[[OpenSearch, accounts]])

Physical Plan: (efficiently pushes reversed sort to OpenSearch)

CalciteEnumerableIndexScan(table=[[OpenSearch, accounts]],
  PushDownContext=[[..., SORT->[
    {"balance": {"order": "desc", "missing": "_last"}},
    {"firstname.keyword": {"order": "asc", "missing": "_first"}}
  ], LIMIT->10000]])
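The plans above show both the direction and the null ordering being flipped (ASC-nulls-first becomes DESC-nulls-last and vice versa). A minimal sketch of that transformation follows. It uses plain Java types rather than Calcite's collation classes, and SortKey, Direction, Nulls, and this reverseCollation are hypothetical stand-ins; the body of the PR's PlanUtils.reverseCollation() is not shown in this conversation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-in for collation reversal; does not use Calcite types. */
final class ReverseCollationSketch {
  enum Direction { ASC, DESC }
  enum Nulls { FIRST, LAST }

  static final class SortKey {
    final String field;
    final Direction direction;
    final Nulls nulls;

    SortKey(String field, Direction direction, Nulls nulls) {
      this.field = field;
      this.direction = direction;
      this.nulls = nulls;
    }

    /** Flips both the sort direction and the null placement. */
    SortKey reversed() {
      Direction d = direction == Direction.ASC ? Direction.DESC : Direction.ASC;
      Nulls n = nulls == Nulls.FIRST ? Nulls.LAST : Nulls.FIRST;
      return new SortKey(field, d, n);
    }

    @Override public boolean equals(Object o) {
      if (!(o instanceof SortKey)) return false;
      SortKey other = (SortKey) o;
      return field.equals(other.field) && direction == other.direction && nulls == other.nulls;
    }

    @Override public int hashCode() { return Objects.hash(field, direction, nulls); }

    @Override public String toString() {
      return field + " " + direction + "-nulls-" + nulls.name().toLowerCase();
    }
  }

  /** Reverses every key while keeping the key order (primary key stays primary). */
  static List<SortKey> reverseCollation(List<SortKey> keys) {
    List<SortKey> result = new ArrayList<>();
    for (SortKey key : keys) {
      result.add(key.reversed());
    }
    return result;
  }

  public static void main(String[] args) {
    List<SortKey> original = List.of(
        new SortKey("balance", Direction.ASC, Nulls.FIRST),
        new SortKey("firstname", Direction.DESC, Nulls.LAST));
    System.out.println(reverseCollation(original));
  }
}
```

Flipping the null placement together with the direction is what makes the reversed result an exact mirror of the original ordering, including rows with missing values.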

2. Reverse with @timestamp (Time-Series Optimization)

Query:

source=time_series_logs | reverse | head 100

Behavior: When no explicit sort exists but the index has an @timestamp field, reverse automatically sorts by @timestamp DESC to show most recent events first.

Use Case: Common pattern in log analysis - users want recent logs first

Logical Plan:

LogicalSystemLimit(sort0=[$0], dir0=[DESC], fetch=[10000], type=[QUERY_SIZE_LIMIT])
  LogicalProject(@timestamp=[$0], category=[$1], value=[$2])
    LogicalSort(sort0=[$0], dir0=[DESC])
      CalciteLogicalIndexScan(table=[[OpenSearch, time_data]])

3. Reverse Ignored (No-Op Case)

Query:

source=accounts | reverse | head 100

Behavior: When there's no explicit sort AND no @timestamp field, reverse is ignored. Results appear in natural index order.

Rationale: Avoid expensive operations when reverse has no meaningful semantic interpretation.

Logical Plan:

LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT])
  LogicalProject(account_number=[$0], firstname=[$1], ...)
    CalciteLogicalIndexScan(table=[[OpenSearch, accounts]])

Note: No sort node is added - reverse is completely ignored.


4. Double Reverse (Cancellation)

Query:

source=accounts | sort +balance, -firstname | reverse | reverse

Behavior: Two reverses cancel each other out, returning to original sort order.

Logical Plan:

LogicalSystemLimit(sort0=[$3], sort1=[$1], dir0=[ASC-nulls-first], dir1=[DESC-nulls-last], fetch=[10000])
  LogicalProject(account_number=[$0], firstname=[$1], ...)
    LogicalSort(sort0=[$3], sort1=[$1], dir0=[ASC-nulls-first], dir1=[DESC-nulls-last])
      CalciteLogicalIndexScan(table=[[OpenSearch, accounts]])

Final sort order matches original query: +balance, -firstname


5. Multiple Sorts + Reverse

Query:

source=accounts | sort +balance | sort -firstname | reverse

Behavior: Reverse applies to the most recent sort (under PPL semantics, the last sort wins).

Logical Plan:

LogicalSystemLimit(sort0=[$1], dir0=[ASC-nulls-first], fetch=[10000])
  LogicalProject(account_number=[$0], firstname=[$1], ...)
    LogicalSort(sort0=[$1], dir0=[ASC-nulls-first])
      CalciteLogicalIndexScan(table=[[OpenSearch, accounts]])

Result: Only firstname sort is reversed (DESC → ASC). The balance sort is overridden by PPL's "last sort wins" rule.
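The double-reverse and multiple-sort cases can both be reproduced by tracking a single "effective sort" while walking the pipeline. The sketch below is a simplified model under stated assumptions: commands are plain strings, only sort and reverse are interpreted, the @timestamp tier is left out, and EffectiveSortSketch with its methods is hypothetical rather than part of the PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model: folds sort/reverse commands into one effective sort. */
final class EffectiveSortSketch {

  /** Flips a PPL-style key such as "+balance"; an unsigned key counts as ascending. */
  static String flip(String key) {
    if (key.startsWith("-")) {
      return "+" + key.substring(1);
    }
    if (key.startsWith("+")) {
      return "-" + key.substring(1);
    }
    return "-" + key;
  }

  /** Each command is either "reverse" or "sort <key>[, <key>...]"; others are skipped. */
  static List<String> effectiveSort(List<String> commands) {
    List<String> current = new ArrayList<>();
    for (String command : commands) {
      String trimmed = command.trim();
      if (trimmed.equals("reverse")) {
        List<String> flipped = new ArrayList<>();
        for (String key : current) {
          flipped.add(flip(key));
        }
        current = flipped;             // with no prior sort this stays empty (no-op)
      } else if (trimmed.startsWith("sort ")) {
        current = new ArrayList<>();   // last sort wins: earlier keys are dropped
        for (String key : trimmed.substring(5).split(",")) {
          current.add(key.trim());
        }
      }
    }
    return current;
  }

  public static void main(String[] args) {
    System.out.println(effectiveSort(List.of("sort +balance, -firstname", "reverse", "reverse")));
    System.out.println(effectiveSort(List.of("sort +balance", "sort -firstname", "reverse")));
  }
}
```

In this model the second call ends with only the firstname key, flipped to ascending, which corresponds to the single ASC sort on firstname in the logical plan for case 5.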


Related Issues

Resolves #3924

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Collaborator

@dai-chen dai-chen left a comment

QQ: I recall the major comment on the original PR was about early optimization in the analyzer layer. Is this new PR trying to address that concern? Ref: #4056 (comment)

Contributor Author

ahkcs commented Nov 11, 2025

QQ: I recall the major comment on the original PR was about early optimization in the analyzer layer. Is this new PR trying to address that concern? Ref: #4056 (comment)

Hi Chen, I think that's a valid concern. However, after trying it out, I found that it adds significant complexity compared to the current approach. CalciteRelNodeVisitor serves as a logical plan builder that constructs the logical representation of the query, so I think this optimization can also happen there. In our approach, visitReverse chooses between LogicalSort(reversed) and LogicalSort(ROW_NUMBER), which seems appropriate for a logical plan builder. If we moved the optimization to a Calcite rule, we would start with a naive representation (always ROW_NUMBER) and then rewrite it, which is the more complex path.

@ahkcs ahkcs requested a review from dai-chen November 11, 2025 22:29
Collaborator

@noCharger noCharger left a comment

Can you add benchmark results on before VS after?

@LantaoJin LantaoJin added the enhancement New feature or request label Nov 12, 2025
Comment on lines 667 to 668
// Fallback: use ROW_NUMBER approach when no existing sort
RexNode rowNumber =
Member

@LantaoJin LantaoJin Nov 12, 2025

Just an idea: how about ignoring the reverse command if there are no existing collations and no IMPLICIT_FIELD_TIMESTAMP in rowType?
If there is an IMPLICIT_FIELD_TIMESTAMP in rowType, there is no need to add ROW_NUMBER.

Contributor Author

Thanks for the suggestion! I think it makes sense, since the reverse command may not have meaningful semantics when there is no ordering.

Contributor Author

@ahkcs ahkcs Nov 12, 2025

Updated the implementation: the reverse command is now ignored if no collation and no @timestamp field is found.
Updated the test cases and documentation as well.

This commit optimizes the `reverse` command in the Calcite planner by
intelligently reversing existing sort collations instead of always using
the ROW_NUMBER() approach.

Key changes:
- Added PlanUtils.reverseCollation() method to flip sort directions and
  null directions
- Updated CalciteRelNodeVisitor.visitReverse() to:
  - Check for existing sort collations
  - Reverse them if present (more efficient)
  - Fall back to ROW_NUMBER() when no sort exists
- Added comprehensive integration test expected outputs for:
  - Single field reverse pushdown
  - Multiple field reverse pushdown
  - Reverse fallback cases
  - Double reverse no-op optimizations

This optimization significantly improves performance when reversing
already-sorted data by leveraging database-native sort reversal.

Based on PR opensearch-project#4056 by @selsong

Signed-off-by: Kai Huang <[email protected]>
@ahkcs ahkcs force-pushed the feat/reverse_optimization branch from b16218b to 39aecf6 Compare November 12, 2025 21:57
Signed-off-by: Kai Huang <[email protected]>
Contributor Author

ahkcs commented Nov 12, 2025

Can you add benchmark results on before VS after?

Tests were run against the big5_v2_first100m dataset (the first 100M docs of big5).

Before: Query failed to execute even with head 10

Query: source=big5_v2_first100m | sort +`@timestamp` | reverse | head <size>

  Running single_sort_reverse_10 (5 iterations)... ERROR: {
  "error": {
    "reason": "There was internal problem at backend",
    "details": "java.sql.SQLException: exception while executing query: insufficient resources to run the query, quit.",
    "type": "RuntimeException"
  },
  "status": 500
}

After:

=== Test 1: Single Field Sort + Reverse ===
Query: source=big5_v2_first100m | sort +`@timestamp` | reverse | head <size>

  Running single_sort_reverse_10 (5 iterations)... Done
  Running single_sort_reverse_100 (5 iterations)... Done
  Running single_sort_reverse_1000 (5 iterations)... Done

Size         Avg (ms)   P50 (ms)   P90 (ms)   Min (ms)   Max (ms)
---------- ---------- ---------- ---------- ---------- ----------
10                496        495        502        490        502
100               501        498        516        495        516
1K                577        576        586        571        586

=== Test 2: Single Field Sort DESC + Reverse ===
Query: source=big5_v2_first100m | sort -`@timestamp` | reverse | head <size>

  Running single_sort_desc_reverse_10 (5 iterations)... Done
  Running single_sort_desc_reverse_100 (5 iterations)... Done
  Running single_sort_desc_reverse_1000 (5 iterations)... Done

Size         Avg (ms)   P50 (ms)   P90 (ms)   Min (ms)   Max (ms)
---------- ---------- ---------- ---------- ---------- ----------
10                422        423        424        419        424
100               436        438        439        434        439
1K                515        507        548        503        548

=== Test 3: Multi-field Sort + Reverse ===
Query: source=big5_v2_first100m | sort +`host.name`, -`@timestamp` | reverse | head <size>

  Running multi_sort_reverse_10 (5 iterations)... Done
  Running multi_sort_reverse_100 (5 iterations)... Done
  Running multi_sort_reverse_1000 (5 iterations)... Done

Size         Avg (ms)   P50 (ms)   P90 (ms)   Min (ms)   Max (ms)
---------- ---------- ---------- ---------- ---------- ----------
10                610        587        699        582        699
100               601        599        612        590        612
1K                665        666        667        661        667

=== Test 4: Double Reverse (should cancel out) ===
Query: source=big5_v2_first100m | sort +`@timestamp` | reverse | reverse | head <size>

  Running double_reverse_10 (5 iterations)... Done
  Running double_reverse_100 (5 iterations)... Done
  Running double_reverse_1000 (5 iterations)... Done

Size         Avg (ms)   P50 (ms)   P90 (ms)   Min (ms)   Max (ms)
---------- ---------- ---------- ---------- ---------- ----------
10                441        423        514        420        514
100               436        437        445        427        445
1K                506        502        521        502        521

=== Test 5: Sort + Reverse with Filter ===
Query: source=big5_v2_first100m | where `host.name` != 'unknown' | sort +`@timestamp` | reverse | head <size>

  Running filter_sort_reverse_10 (5 iterations)... Done
  Running filter_sort_reverse_100 (5 iterations)... Done
  Running filter_sort_reverse_1000 (5 iterations)... Done

Size         Avg (ms)   P50 (ms)   P90 (ms)   Min (ms)   Max (ms)
---------- ---------- ---------- ---------- ---------- ----------
10                532        511        612        507        612
100               518        520        532        507        532
1K                589        587        598        585        598

=== REVERSE COMMAND PERFORMANCE TEST COMPLETE ===
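In every row of the tables above, P90 equals Max. That is what a nearest-rank percentile produces with only five samples, since the 90th-percentile rank rounds up to the fifth (largest) value. The benchmark script itself is not shown, so the following is only an assumption about how such a summary could be computed; LatencySummarySketch and its methods are hypothetical, and the input values are made up for illustration and are not taken from the benchmark.

```java
import java.util.Arrays;

/** Hypothetical summary helper using nearest-rank percentiles. */
final class LatencySummarySketch {

  /** Nearest-rank percentile: the value at rank ceil(p * n / 100) in sorted order. */
  static long percentile(long[] samples, int p) {
    if (samples.length == 0 || p < 1 || p > 100) {
      throw new IllegalArgumentException("need at least one sample and 1 <= p <= 100");
    }
    long[] sorted = samples.clone();
    Arrays.sort(sorted);
    int rank = (p * sorted.length + 99) / 100;  // integer form of ceil(p * n / 100)
    return sorted[rank - 1];
  }

  /** Arithmetic mean rounded to the nearest millisecond. */
  static long average(long[] samples) {
    long sum = 0;
    for (long sample : samples) {
      sum += sample;
    }
    return Math.round((double) sum / samples.length);
  }

  public static void main(String[] args) {
    long[] madeUpLatenciesMs = {50, 10, 40, 20, 30};  // illustrative values only
    System.out.println("avg=" + average(madeUpLatenciesMs)
        + " p50=" + percentile(madeUpLatenciesMs, 50)
        + " p90=" + percentile(madeUpLatenciesMs, 90)
        + " max=" + percentile(madeUpLatenciesMs, 100));
  }
}
```

With five iterations per size, P90 therefore carries no information beyond Max; more iterations would be needed for the two columns to diverge.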

}

@Test
public void testReverseWithTimestampField() throws IOException {
Member

@LantaoJin LantaoJin Nov 13, 2025

Can you add some ITs for streamstats? streamstats always sorts on __stream_seq__ to keep the output ordering the same as the input's. We'd better add some ITs to verify the pipe streamstats | reverse.

Contributor Author

Added ITs for streamstats. Since the reverse command now only works when there is a detectable collation, using streamstats alone causes the reverse command to be ignored. Do we want to keep it this way?

Member

@LantaoJin LantaoJin Nov 15, 2025

No, streamstats contains the collation __stream_seq__ ASC.

Signed-off-by: Kai Huang <[email protected]>
@ahkcs ahkcs requested review from LantaoJin and noCharger November 14, 2025 18:19
Successfully merging this pull request may close these issues.

[FEATURE] Support reverse pushdown with Calcite

4 participants